The definition of adequate metrics between the objects to be compared is at the core of many machine learning methods (e.g., nearest neighbors, kernel machines). When complex objects such as time series are involved, such metrics have to be carefully designed in order to capture the desired notions of similarity.
Let us illustrate our point with an example.
The figure above is the result of a $k$-means clustering that uses Euclidean distance as a base metric. One issue with this metric is that it is not invariant to time shifts, while the dataset at stake clearly exhibits such shifts.
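To make this concrete, here is a small numpy sketch (the series and shift values are illustrative choices, not taken from the dataset above) showing that the Euclidean distance between two series with identical shapes but a time shift can be large:

```python
import numpy as np

# Two identical Gaussian bumps, the second one shifted by 10 time steps
t = np.arange(50)
x = np.exp(-0.5 * ((t - 15) / 3.0) ** 2)
x_shifted = np.exp(-0.5 * ((t - 25) / 3.0) ** 2)

# Large Euclidean distance despite the two shapes being identical
eucl = np.linalg.norm(x - x_shifted)
```

Here `eucl` is of the same order of magnitude as the norms of the series themselves, even though the two bumps only differ by a shift.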
When using a shift-invariant similarity measure (discussed in our {ref}`sec:dtw` section) at the core of $k$-means, one gets:
This part of the course tackles the definition of adequate similarity measures for time series and their use at the core of machine learning methods.
This section covers works related to Dynamic Time Warping for time series.
Dynamic Time Warping (DTW) (Sakoe and Chiba 1978) is a similarity measure between time series. Consider two time series $x$ and $x^\prime$ of respective lengths $n$ and $m$. Here, all elements $x_i$ and $x^\prime_j$ are assumed to lie in the same $p$-dimensional space, and the exact timestamps at which observations occur are disregarded: only their ordering matters.
In the following, a path $\pi$ of length $K$ is a sequence of $K$ index pairs $\left((i_0, j_0), \dots, (i_{K-1}, j_{K-1})\right)$.
DTW between $x$ and $x^\prime$ is formulated as the following optimization problem:

$$DTW_q(x, x^\prime) = \min_{\pi \in \mathcal{A}(x, x^\prime)} \left( \sum_{(i, j) \in \pi} d(x_i, x^\prime_j)^q \right)^{\frac{1}{q}}$$
where $\mathcal{A}(x, x^\prime)$ is the set of all admissible paths, i.e. the set of paths $\pi$ such that:
* $\pi$ is a sequence $[\pi_0, \dots, \pi_{K-1}]$ of index pairs $\pi_k = (i_k, j_k)$ with $0 \leq i_k < n$ and $0 \leq j_k < m$
* $\pi_0 = (0, 0)$ and $\pi_{K-1} = (n-1, m-1)$
* for all $k > 0$, $\pi_k = (i_k, j_k)$ is related to $\pi_{k-1} = (i_{k-1}, j_{k-1})$ as follows:
  * $i_{k-1} \leq i_k \leq i_{k-1} + 1$
  * $j_{k-1} \leq j_k \leq j_{k-1} + 1$
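These conditions translate directly into code. Below is a hypothetical helper, `is_admissible` (an illustrative name, not part of any library), that checks whether a candidate path, given as a 0-indexed list of index pairs, satisfies them:

```python
def is_admissible(path, n, m):
    """Check whether `path` is an admissible alignment path between
    two series of lengths n and m (indices are 0-based)."""
    # Boundary conditions: start at (0, 0), end at (n - 1, m - 1)
    if path[0] != (0, 0) or path[-1] != (n - 1, m - 1):
        return False
    # Monotonicity and unit-step conditions on consecutive pairs
    for (i_prev, j_prev), (i, j) in zip(path, path[1:]):
        if not (i_prev <= i <= i_prev + 1 and j_prev <= j <= j_prev + 1):
            return False
    return True
```

For instance, `[(0, 0), (1, 0), (1, 1), (2, 2)]` is admissible between series of lengths 3 and 3, whereas a path that skips indices (e.g. jumping from `(0, 0)` to `(2, 1)`) is not.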
In what follows, we will denote $DTW_q$ as DTW. In this context, a path can be seen as a temporal alignment of the time series, and the optimal path is such that the Euclidean distance between the aligned (i.e., resampled) time series is minimal.
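To illustrate this alignment view, here is a small sketch (the `path_cost` helper is a hypothetical name introduced for illustration) that resamples two univariate series along a given path and computes the resulting element-wise cost:

```python
import numpy as np

def path_cost(x, x_prime, path, q=2):
    """Cost of an alignment path: both series are resampled along the
    path (values may be repeated), then compared element-wise."""
    aligned = np.array([x[i] for i, _ in path])
    aligned_prime = np.array([x_prime[j] for _, j in path])
    return float(np.sum(np.abs(aligned - aligned_prime) ** q) ** (1.0 / q))

# This path aligns x = (0, 1, 2) with x_prime = (0, 1, 1, 2) at zero cost:
# the repeated value 1 in x_prime is matched twice to the single 1 in x
cost = path_cost([0., 1., 2.], [0., 1., 1., 2.],
                 [(0, 0), (1, 1), (1, 2), (2, 3)])
```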
The following image exhibits the DTW path (in white) for a given pair of time series, on top of the cross-similarity matrix that stores the values $d(x_i, x^\prime_j)$:
There exists an algorithm to compute the exact optimum for this problem (assuming each computation of $d(\cdot, \cdot)$ is $O(1)$):
```python
import numpy as np

def dtw(x, x_prime, q=2):
    """Exact DTW between two time series, computed by dynamic programming."""
    n, m = len(x), len(x_prime)
    gamma = np.zeros((n, m))  # gamma[i, j]: best cumulative cost ending at (i, j)
    for i in range(n):
        for j in range(m):
            dist = np.linalg.norm(np.asarray(x[i]) - np.asarray(x_prime[j])) ** q
            if i == 0 and j == 0:
                gamma[i, j] = dist
            else:
                gamma[i, j] = dist + min(
                    gamma[i - 1, j] if i > 0 else np.inf,
                    gamma[i, j - 1] if j > 0 else np.inf,
                    gamma[i - 1, j - 1] if (i > 0 and j > 0) else np.inf)
    return gamma[n - 1, m - 1] ** (1. / q)
```
The basic idea behind this algorithm is that there exists a recurrence relationship between partial DTW computations. More precisely, if we denote by $\gamma_{i,j}$ the similarity (at power $q$) between sequences $x_{\rightarrow i}$ and $x^\prime_{\rightarrow j}$ (where the notation $x_{\rightarrow i}$ denotes the sequence $x$ observed up to time $i$), then we have:

$$\gamma_{i,j} = d(x_i, x^\prime_j)^q + \min\left(\gamma_{i-1,j}, \gamma_{i,j-1}, \gamma_{i-1,j-1}\right)$$
and $DTW_q(x, x^\prime)$ is then $\gamma_{n-1,m-1}^{\frac{1}{q}}$. In more detail:
The dynamic programming algorithm presented above relies on this recurrence formula and stores intermediate computations for efficiency.
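As a complementary view, the same recurrence can be written as a memoized recursion, where the cache plays the role of the dynamic programming table. A sketch for univariate series (`dtw_recursive` is an illustrative name, not part of any library):

```python
from functools import lru_cache

def dtw_recursive(x, x_prime, q=2):
    """DTW via the recurrence on gamma_{i,j}; lru_cache stores the
    intermediate values, mirroring the dynamic programming table."""
    @lru_cache(maxsize=None)
    def gamma(i, j):
        dist = abs(x[i] - x_prime[j]) ** q
        if i == 0 and j == 0:
            return dist
        prev = []
        if i > 0:
            prev.append(gamma(i - 1, j))
        if j > 0:
            prev.append(gamma(i, j - 1))
        if i > 0 and j > 0:
            prev.append(gamma(i - 1, j - 1))
        return dist + min(prev)
    return gamma(len(x) - 1, len(x_prime) - 1) ** (1.0 / q)
```

Without the cache, the naive recursion would recompute the same $\gamma_{i,j}$ values exponentially many times; storing them is exactly what makes the quadratic-time algorithm possible.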
Dot product notation
Dynamic Time Warping can also be formalized using the following dot-product notation:

$$DTW_q(x, x^\prime) = \min_{\pi \in \mathcal{A}(x, x^\prime)} \langle A_\pi, D_q(x, x^\prime) \rangle^{\frac{1}{q}}$$
where $D_q(x, x^\prime)$ stores distances $d(x_i, x^\prime_j)$ at the power $q$ and $A_\pi \in \{0, 1\}^{n \times m}$ is a binary matrix such that $(A_\pi)_{i,j} = 1$ if and only if $(i, j) \in \pi$.
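This formulation can be sketched numerically by building $D_q$ and the binary path matrix $A_\pi$ explicitly for a toy example (`path_matrix` is a hypothetical helper name):

```python
import numpy as np

def path_matrix(path, n, m):
    """Binary matrix A_pi with a 1 in every cell visited by the path."""
    A = np.zeros((n, m))
    for i, j in path:
        A[i, j] = 1.0
    return A

x = np.array([0., 1., 2.])
x_prime = np.array([0., 1., 1., 2.])
q = 2

# D_q stores pairwise distances raised to the power q
D_q = np.abs(x[:, None] - x_prime[None, :]) ** q

# Cost of one admissible path as a (Frobenius) dot product <A_pi, D_q>
A_pi = path_matrix([(0, 0), (1, 1), (1, 2), (2, 3)], len(x), len(x_prime))
cost = np.sum(A_pi * D_q)
```

Minimizing this dot product over all admissible $A_\pi$ recovers the optimization problem above.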
Dynamic Time Warping satisfies the following properties:
However, mathematically speaking, DTW is not a valid metric, since it satisfies neither the triangle inequality nor the identity of indiscernibles.
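Both failures can be checked numerically. The sketch below re-implements a minimal univariate DTW (same recurrence as above) and exhibits counterexamples; the specific series are illustrative choices:

```python
from functools import lru_cache

def dtw(x, x_prime, q=2):
    """Minimal univariate DTW, following the recurrence from the text."""
    @lru_cache(maxsize=None)
    def gamma(i, j):
        dist = abs(x[i] - x_prime[j]) ** q
        if i == 0 and j == 0:
            return dist
        prev = []
        if i > 0:
            prev.append(gamma(i - 1, j))
        if j > 0:
            prev.append(gamma(i, j - 1))
        if i > 0 and j > 0:
            prev.append(gamma(i - 1, j - 1))
        return dist + min(prev)
    return gamma(len(x) - 1, len(x_prime) - 1) ** (1.0 / q)

# Identity of indiscernibles fails: two distinct series at DTW distance 0
zero_dist = dtw((0., 1., 2.), (0., 1., 1., 2.))

# Triangle inequality fails: dtw(a, b) > dtw(a, c) + dtw(c, b),
# since the single sample of a must be matched to all three samples of b
a, b, c = (0.,), (1., 1., 1.), (1.,)
violated = dtw(a, b) > dtw(a, c) + dtw(c, b)
```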
The set of temporal deformations to which DTW is invariant can be reduced by imposing additional constraints on the set of acceptable paths. Such constraints typically consist in forcing paths to stay close to the diagonal.
The Sakoe-Chiba band is parametrized by a radius $r$ (the number of off-diagonal elements to consider, sometimes also called the warping window size), as illustrated below:
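Assuming, for simplicity, two series of equal length, the band can be sketched as a boolean mask over the cells $(i, j)$ that alignment paths are allowed to visit (rectangular cases additionally require rescaling the diagonal):

```python
import numpy as np

def sakoe_chiba_mask(size, radius):
    """Boolean mask of cells within a Sakoe-Chiba band of the given radius,
    for two series of equal length `size` (a sketch of the square case)."""
    i = np.arange(size)[:, None]
    j = np.arange(size)[None, :]
    return np.abs(i - j) <= radius  # keep only cells close to the diagonal

mask = sakoe_chiba_mask(4, 1)
```

Cells where the mask is `False` are simply excluded from the minimization (equivalently, their cost is set to infinity in the dynamic programming table).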
The Itakura parallelogram sets a maximum slope for alignment paths, which leads to a parallelogram-shaped constraint: